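We consider the classic problem of state tomography: given copies of an unknown quantum state $\rho \in \mathbb{C}^{d \times d}$, output $\widehat{\rho}$ such that $\|\rho - \widehat{\rho}\|_{\mathsf{tr}} \le \varepsilon$. When one is allowed to make coherent measurements entangled across all copies, $\Theta(d^2/\varepsilon^2)$ copies are necessary and sufficient [Haah et al. '17, O'Donnell-Wright '16]. Unfortunately, the protocols achieving this rate incur large quantum memory overheads that preclude implementation on current or near-term devices. On the other hand, the best known protocol using incoherent (single-copy) measurements uses $O(d^3/\varepsilon^2)$ copies [Kueng-Rauhut-Terstiege '17], and whether this rate is tight has remained an open question. In this work, we resolve it by showing that any protocol using incoherent measurements, even if they are chosen adaptively, requires $\Omega(d^3/\varepsilon^2)$ copies, matching the upper bound of [Kueng-Rauhut-Terstiege '17]. We do so with a new proof technique that directly bounds the "tilt" of the posterior distribution after measurements; this yields a short proof of our lower bound, and we believe it may be of independent interest.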
translated by 谷歌翻译
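Domain generalization aims to perform well on unseen test environments using data from a limited number of training environments. Despite a proliferation of proposed algorithms for this task, assessing their performance, both theoretically and empirically, remains very challenging. Distribution-matching algorithms such as (Conditional) Domain Adversarial Networks [Ganin et al., 2016, Long et al., 2018] are popular and enjoy empirical success, but they lack formal guarantees. Other approaches such as Invariant Risk Minimization (IRM) require a prohibitively large number of training environments, scaling with the dimension $d_s$ of the spurious feature space, even on simple data models like the one proposed by [Rosenfeld et al., 2021]. Under a variant of this model, we show that neither ERM nor IRM can generalize with $o(d_s)$ environments. We then present an iterative feature matching algorithm that is guaranteed, with high probability, to yield a predictor that generalizes after seeing only $O(\log d_s)$ environments. Our results provide the first theoretical justification, under a concrete nontrivial data model, for a family of distribution-matching algorithms that is widely used in practice.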
Classically, data interpolation with a parametrized model class is possible as long as the number of parameters is larger than the number of equations to be satisfied. A puzzling phenomenon in deep learning is that models are trained with many more parameters than this classical theory would suggest. We propose a partial theoretical explanation for this phenomenon. We prove that for a broad class of data distributions and model classes, overparametrization is necessary if one wants to interpolate the data smoothly. Namely, we show that smooth interpolation requires $d$ times more parameters than mere interpolation, where $d$ is the ambient data dimension. We prove this universal law of robustness for any smoothly parametrized function class with polynomial-size weights, and any covariate distribution satisfying isoperimetry. In the case of two-layer neural networks and Gaussian covariates, this law was conjectured in prior work by Bubeck, Li and Nagaraj. We also give an interpretation of our result as an improved generalization bound for model classes consisting of smooth functions.
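We consider incentivized exploration: a version of multi-armed bandits in which the choice of arms is controlled by self-interested agents, and the algorithm can only issue recommendations. The algorithm controls the flow of information, and the resulting information asymmetry can incentivize the agents to explore. Prior work achieves optimal regret rates up to multiplicative factors that become arbitrarily large depending on the Bayesian priors and that scale exponentially in the number of arms. The more basic problem of sampling each arm once runs into similar factors. We focus on the price of incentives: the loss in performance, broadly construed, incurred for the sake of incentive compatibility. We prove that Thompson Sampling, a standard bandit algorithm, is incentive-compatible if it is initialized with sufficiently many data points. The performance loss due to incentives is therefore limited to the initial rounds in which these data points are collected. The problem thus largely reduces to one of sample complexity: how many rounds are needed? We address this question, providing matching upper and lower bounds and instantiating them in various corollaries. Typically, the optimal sample complexity is polynomial in the number of arms and exponential in the "strength of beliefs".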
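To illustrate the mechanics of "Thompson Sampling initialized with data points", here is a minimal sketch for Bernoulli arms with uniform Beta(1, 1) priors. It is an assumption-laden toy, not the paper's construction: the function name `thompson_sampling`, the parameter `n_init` (initial samples per arm), and the choice of priors are hypothetical, and the sketch does not model agents or verify incentive compatibility.

```python
import random


def thompson_sampling(true_means, n_init, n_rounds, seed=0):
    """Beta-Bernoulli Thompson Sampling with a warm start.

    true_means: success probability of each arm (unknown to the algorithm).
    n_init:     hypothetical number of initial samples collected per arm.
    n_rounds:   number of Thompson Sampling rounds after the warm start.
    Returns the total number of pulls of each arm.
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k

    def pull(arm):
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    # Warm start: collect n_init data points from every arm.
    for arm in range(k):
        for _ in range(n_init):
            pull(arm)

    # Thompson Sampling: sample a mean from each Beta posterior
    # (uniform Beta(1, 1) prior assumed) and play the arm with the largest sample.
    for _ in range(n_rounds):
        samples = [
            rng.betavariate(1 + successes[a], 1 + failures[a]) for a in range(k)
        ]
        pull(max(range(k), key=samples.__getitem__))

    return pulls
```

For example, `thompson_sampling([0.2, 0.8], n_init=10, n_rounds=500)` spends the first 20 rounds on the warm start and the remaining 500 on posterior sampling. How large `n_init` must be for incentive compatibility is exactly the sample-complexity question the abstract addresses, and this sketch does not answer it.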
Research on automated essay scoring has become increasingly important because it serves as a method for evaluating students' written responses at scale. Scalable methods for scoring written responses are needed as students migrate to online learning environments, resulting in the need to evaluate large numbers of written-response assessments. The purpose of this study is to describe and evaluate three active learning methods that can be used to minimize the number of essays that must be scored by human raters while still providing the data needed to train a modern automated essay scoring system. The three active learning methods are the uncertainty-based, the topological-based, and the hybrid method. These three methods were used to select essays from the Automated Student Assessment Prize competition, which were then classified using a scoring model trained with the Bidirectional Encoder Representations from Transformers (BERT) language model. All three active learning methods produced strong results, with the topological-based method producing the most efficient classification. Growth rate accuracy was also evaluated. The active learning methods produced different levels of efficiency under different sample size allocations but, overall, all three methods were highly efficient and produced classifications that were similar to one another.
This paper presents a novel framework for planning in unknown and occluded urban spaces. We specifically focus on turns and intersections where occlusions significantly impact navigability. Our approach uses an inpainting model to fill in a sparse, occluded, semantic lidar point cloud and plans dynamically feasible paths for a vehicle to traverse through the open and inpainted spaces. We demonstrate our approach using a car's lidar data with real-time occlusions, and show that by inpainting occluded areas, we can plan longer paths with more turn options than without inpainting. In addition, our approach follows paths derived from a planner with no occlusions (called the ground truth) more closely than other state-of-the-art approaches.
Feature acquisition algorithms address the problem of acquiring informative features while balancing the costs of acquisition to improve the learning performance of ML models. Previous approaches have focused on calculating the expected utility values of features to determine the acquisition sequences. Other approaches formulated the problem as a Markov Decision Process (MDP) and applied reinforcement-learning-based algorithms. In comparison to previous approaches, we focus on 1) formulating the feature acquisition problem as an MDP and applying Monte Carlo Tree Search, 2) calculating the intermediary rewards for each acquisition step based on model improvements and acquisition costs, and 3) simultaneously optimizing model improvement and acquisition costs with multi-objective Monte Carlo Tree Search. With Proximal Policy Optimization and Deep Q-Network algorithms as benchmarks, we show the effectiveness of our proposed approach in an experimental study.
The celebrated proverb that "speech is silver, silence is golden" has a long multinational history and multiple specific meanings. In written texts, punctuation can in fact be considered one of its manifestations. Indeed, the virtue of speaking and writing effectively involves, often decisively, the capacity to place breaks properly. In the present study, based on a large corpus of world-famous and representative literary texts in seven major Western languages, it is shown that the distribution of intervals between consecutive punctuation marks in almost all texts can universally be characterised by only two parameters of the discrete Weibull distribution, which can be given an intuitive interpretation in terms of the so-called hazard function. The values of these two parameters tend to be language-specific, however, and even appear to navigate translations. The properties of the computed hazard functions indicate that, among the studied languages, English turns out to be the least constrained by the necessity to place a consecutive punctuation mark to partition a sequence of words. This may suggest that, when compared to the other studied languages, English is more flexible, in the sense of allowing longer uninterrupted sequences of words. Spanish shows a similar tendency, to only a slightly lesser extent.
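As a rough illustration of how two parameters determine both the interval distribution and its hazard function, the sketch below uses one common parametrization of the discrete Weibull distribution on k = 1, 2, ..., with survival function P(X > k) = q^(k^beta). The parametrization and the names `dw_pmf`, `dw_hazard`, `q`, and `beta` are assumptions made here; the study itself may use a different but equivalent convention.

```python
def dw_pmf(k, q, beta):
    """P(X = k) for a discrete Weibull variable on k = 1, 2, ...

    Assumed parametrization: P(X > k) = q ** (k ** beta), with 0 < q < 1
    and beta > 0, so P(X = k) = P(X > k - 1) - P(X > k).
    """
    return q ** ((k - 1) ** beta) - q ** (k ** beta)


def dw_hazard(k, q, beta):
    """Hazard h(k) = P(X = k | X >= k).

    Read here as the probability that a punctuation mark follows the k-th
    word, given that none has appeared after the previous k - 1 words.
    """
    return 1.0 - q ** (k ** beta - (k - 1) ** beta)
```

Under this parametrization, `beta = 1` gives a constant hazard `1 - q` (the geometric, memoryless case), `beta > 1` gives a hazard that grows with the number of words since the last mark, and `beta < 1` gives a decreasing hazard.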
This report summarizes the 3rd International Verification of Neural Networks Competition (VNN-COMP 2022), held as a part of the 5th Workshop on Formal Methods for ML-Enabled Autonomous Systems (FoMLAS), which was collocated with the 34th International Conference on Computer-Aided Verification (CAV). VNN-COMP is held annually to facilitate the fair and objective comparison of state-of-the-art neural network verification tools, encourage the standardization of tool interfaces, and bring together the neural network verification community. To this end, standardized formats for networks (ONNX) and specification (VNN-LIB) were defined, tools were evaluated on equal-cost hardware (using an automatic evaluation pipeline based on AWS instances), and tool parameters were chosen by the participants before the final test sets were made public. In the 2022 iteration, 11 teams participated on a diverse set of 12 scored benchmarks. This report summarizes the rules, benchmarks, participating tools, results, and lessons learned from this iteration of this competition.
Automatic machine translation (MT) metrics are widely used to distinguish the translation qualities of machine translation systems across relatively large test sets (system-level evaluation). However, it is unclear if automatic metrics are reliable at distinguishing good translations from bad translations at the sentence level (segment-level evaluation). In this paper, we investigate how useful MT metrics are at detecting the success of a machine translation component when placed in a larger platform with a downstream task. We evaluate the segment-level performance of the most widely used MT metrics (chrF, COMET, BERTScore, etc.) on three downstream cross-lingual tasks (dialogue state tracking, question answering, and semantic parsing). For each task, we only have access to a monolingual task-specific model. We calculate the correlation between the metric's ability to predict a good/bad translation with the success/failure on the final task for the Translate-Test setup. Our experiments demonstrate that all metrics exhibit negligible correlation with the extrinsic evaluation of the downstream outcomes. We also find that the scores provided by neural metrics are not interpretable mostly because of undefined ranges. Our analysis suggests that future MT metrics be designed to produce error labels rather than scores to facilitate extrinsic evaluation.
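One way to picture the segment-level comparison described above is to threshold each metric score into a good/bad prediction and correlate that with downstream success/failure. The sketch below uses the Matthews correlation coefficient for two binary sequences; the choice of this coefficient, the fixed threshold, and the name `metric_task_correlation` are assumptions made for illustration and are not taken from the abstract.

```python
import math


def metric_task_correlation(scores, task_success, threshold):
    """Matthews correlation between thresholded metric scores and task outcomes.

    scores:       metric score per translated segment.
    task_success: True if the downstream task succeeded on that segment.
    threshold:    hypothetical cutoff; score >= threshold predicts "good".
    Returns a value in [-1, 1], or 0.0 when the correlation is undefined.
    """
    predicted_good = [s >= threshold for s in scores]
    pairs = list(zip(predicted_good, task_success))
    tp = sum(1 for p, a in pairs if p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # One of the sequences is constant, so no correlation can be computed.
        return 0.0
    return (tp * tn - fp * fn) / denom
```

A value near zero from such a computation corresponds to the "negligible correlation" reported in the abstract, while values near 1 would mean the metric reliably predicts downstream success.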